227 research outputs found

    Parameterizing Topic Models for Empirical Research

    Machine learning techniques are increasingly employed in business research to discover or extract new, simple features from large, unstructured data. These machine-learned features (MLFs) are then used as independent or explanatory variables in the main econometric models for empirical research. Despite this growing trend, there has been little research on how using MLFs affects statistical inference in empirical research. In this paper, we address parameter estimation issues related to the use of topics/features extracted by Latent Dirichlet Allocation, a popular machine learning technique for text mining. We propose a novel method to extract features that result in minimum-variance estimation of the regression model parameters. This enables better use of unstructured text data for econometric modeling in empirical research. The effectiveness of the proposed method is validated with an experimental evaluation study on real-world text data.

    Feature Selection with Cost Constraint

    When acquiring consumer data for marketing or new business initiatives, it is important to decide which features of potential customers should be acquired. We study the feature selection and acquisition problem with a cost constraint in the context of regression prediction. We formulate it as a nonlinear programming problem that minimizes the prediction error and the number of features used in the model, subject to a budget constraint. We derive the analytical properties of the solution to this problem and provide a computational procedure for solving it. The results of a preliminary experiment demonstrate the effectiveness of our approach.
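To make the formulation above concrete, here is a minimal sketch of budget-constrained feature selection for a toy regression. It is a brute-force enumeration, not the nonlinear programming procedure the abstract refers to. The names `select_features`, `costs`, `budget`, and `penalty`, the additive per-feature penalty, and the data are all assumptions made for illustration.

```python
from itertools import combinations


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def sse(X, y, cols):
    """Sum of squared errors of a least-squares fit (with intercept) on columns `cols`."""
    rows = [[1.0] + [row[c] for c in cols] for row in X]
    k = len(cols) + 1
    # Normal equations: (R^T R) beta = R^T y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    beta = solve(A, b)
    return sum((t - sum(w * v for w, v in zip(beta, r))) ** 2 for r, t in zip(rows, y))


def select_features(X, y, costs, budget, penalty=0.0):
    """Return (score, columns) minimizing SSE + penalty * (number of features),
    over all feature subsets whose total acquisition cost fits the budget."""
    best = None
    p = len(costs)
    for k in range(p + 1):
        for cols in combinations(range(p), k):
            if sum(costs[c] for c in cols) > budget:
                continue  # subset violates the budget constraint
            score = sse(X, y, cols) + penalty * k
            if best is None or score < best[0]:
                best = (score, cols)
    return best


# Hypothetical data: the target depends only on feature 1 (y = 3 * x1).
X = [[1, 0, 2], [0, 1, 1], [2, 3, 0], [3, 1, 5], [1, 4, 2]]
y = [0, 3, 9, 3, 12]
costs = [1, 2, 4]  # assumed acquisition cost of each feature

score, chosen = select_features(X, y, costs, budget=3, penalty=0.1)
print(chosen)
```

Enumeration is exponential in the number of features, so this only illustrates the objective and the constraint on a tiny example. The solver is bare-bones and assumes the selected columns are not collinear.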

    A Causal And-Or Graph Model for Visibility Fluent Reasoning in Tracking Interacting Objects

    Tracking humans who are interacting with other subjects or the environment remains unsolved in visual tracking, because the visibility of the humans of interest in videos is unknown and may vary over time. In particular, it is still difficult for state-of-the-art human trackers to recover complete human trajectories in crowded scenes with frequent human interactions. In this work, we consider the visibility status of a subject as a fluent variable, whose change is mostly attributed to the subject's interaction with its surroundings, e.g., crossing behind another object, entering a building, or getting into a vehicle. We introduce a Causal And-Or Graph (C-AOG) to represent the cause-effect relations between an object's visibility fluent and its activities, and develop a probabilistic graph model to jointly reason about visibility fluent changes (e.g., from visible to invisible) and track humans in videos. We formulate this joint task as an iterative search for a feasible causal graph structure that admits fast search algorithms, e.g., dynamic programming. We apply the proposed method to challenging video sequences to evaluate its ability to estimate visibility fluent changes of subjects and to track subjects of interest over time. Comparative results demonstrate that our method outperforms the alternative trackers and can recover complete trajectories of humans in complicated scenarios with frequent human interactions. Comment: accepted by CVPR 201

    Discrete Multi-modal Hashing with Canonical Views for Robust Mobile Landmark Search

    Mobile landmark search (MLS) has recently received increasing attention for its great practical value. However, it remains unsolved due to two important challenges: the high bandwidth consumption of query transmission, and the large visual variations of query images sent from mobile devices. In this paper, we propose a novel hashing scheme, named canonical view based discrete multi-modal hashing (CV-DMH), to handle these problems via a novel three-stage learning procedure. First, a submodular function is designed to measure the visual representativeness and redundancy of a view set. With it, canonical views, which capture the key visual appearances of a landmark with limited redundancy, are efficiently discovered with an iterative mining strategy. Second, multi-modal sparse coding is applied to transform visual features from multiple modalities into an intermediate representation. It can robustly and adaptively characterize the visual contents of varied landmark images with certain canonical views. Finally, compact binary codes are learned on the intermediate representation within a tailored discrete binary embedding model, which preserves the visual relations of images measured with canonical views and removes the noise involved. In this part, we develop a new augmented Lagrangian multiplier (ALM) based optimization method to directly solve for the discrete binary codes. It not only deals explicitly with the discrete constraint, but also considers the bit-uncorrelated constraint and the balance constraint together. Experiments on real-world landmark datasets demonstrate the superior performance of CV-DMH over several state-of-the-art methods.
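The balance and bit-uncorrelated constraints mentioned above can be illustrated with a small sketch. It assumes the conventions common in the hashing literature: codes take values in {-1, +1}, balance means each bit is +1 for half of the items, and uncorrelated means distinct bit columns have zero inner product. The exact formulation in CV-DMH may differ. The function names and the toy codes are hypothetical, and nothing here reproduces the ALM-based learning step.

```python
def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def is_balanced(codes):
    """Balance: every bit column sums to zero over the dataset (+1 and -1 equally often)."""
    return all(sum(col) == 0 for col in zip(*codes))


def is_uncorrelated(codes):
    """Bit-uncorrelated: every pair of distinct bit columns has zero inner product."""
    cols = list(zip(*codes))
    return all(
        sum(x * y for x, y in zip(cols[i], cols[j])) == 0
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
    )


def rank_by_hamming(query, database):
    """Indices of database codes sorted by Hamming distance to the query (ties keep index order)."""
    return sorted(range(len(database)), key=lambda i: hamming(query, database[i]))


# Hypothetical 2-bit codes for four database images.
codes = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
print(is_balanced(codes), is_uncorrelated(codes))
print(rank_by_hamming((1, -1), codes))
```

Ranking by Hamming distance over short binary codes is what makes hashing attractive for the bandwidth-limited mobile setting the abstract describes: only a few bits per query need to be transmitted and compared.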